-
Notifications
You must be signed in to change notification settings - Fork 332
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for OpenEphemeral bytecode #768
base: main
Are you sure you want to change the base?
Conversation
bfa0fd6
to
c913f04
Compare
Hey @diegoreis42, looks very sensible to me! |
core/vdbe/mod.rs
Outdated
@@ -260,12 +269,19 @@ impl ProgramState { | |||
} | |||
|
|||
macro_rules! must_be_btree_cursor { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This macro will probably have to be renamed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
must_be_general_cursor
maybe?
91a802f
to
a0a2494
Compare
a0a2494
to
7aa4e56
Compare
core/vdbe/mod.rs
Outdated
cursors | ||
.get_mut(*cursor_id) | ||
.unwrap() | ||
.replace(Cursor::new_ephemeral(EphemeralCursor::new())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not so sure what should be returned here.
bbe2896
to
ed66132
Compare
8cd2e82
to
9e310c7
Compare
core/ephemeral.rs
Outdated
use crate::{types::OwnedValue, Result}; | ||
|
||
pub struct EphemeralCursor { | ||
table: Rc<RefCell<EphemeralTable>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't need to be a Rc<RefCell<T>>
right? Just table: EphemeralTable
core/ephemeral.rs
Outdated
} | ||
} | ||
SeekKey::IndexKey(index_key) => { | ||
// Seek by index key (ignoring row ID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should support the key being either a rowid
or an arbitrary key. From SQLite docs:
Open a new cursor P1 to a transient table. The cursor is always opened read/write even if the main database is read-only. The ephemeral table is deleted automatically when the cursor is closed.
If the cursor P1 is already opened on an ephemeral table, the table is cleared (all content is erased).
P2 is the number of columns in the ephemeral table. The cursor points to a BTree table if P4==0 and to a BTree index if P4 is not 0. If P4 is not NULL, it points to a KeyInfo structure that defines the format of keys in the index.
So I think EphemeralTable
should be something like
pub struct EphemeralTable<Key: Ord> {
pub rows: BTreeMap<Key, Vec<OwnedValue>>,
pub next_rowid: u64, // generate rowids
pub columns: Vec<Column>, // columns
}
OR define it as an enum:
enum Ephemeral {
Table(EphemeralTable)
Index(EphemeralIndex)
}
and have separate definitions.
Either way, searching with an index key should be O(log n)
in an ephemeral index, instead of O(n)
as here.
baf3a0c
to
e5a4212
Compare
Take the logical AND of the values in registers P1 and P2 and write the result into register P3. If either P1 or P2 is 0 (false) then the result is 0 even if the other input is NULL. A NULL and true or two NULLs give a NULL output.
Take the logical OR of the values in register P1 and P2 and store the answer in register P3. If either P1 or P2 is nonzero (true) then the result is 1 (true) even if the other input is NULL. A NULL and false or two NULLs give a NULL output.
e5a4212
to
21552c0
Compare
@diegoreis42 any idea when you plan to merge this in? Having this for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, I think there's still a few issues that need addressing in some way. I left some suggestions which may or may not make sense :)
// No matching record found | ||
self.rowid = None; | ||
self.current = None; | ||
self.null_flag = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
null_flag
is an outer join specific concept and shouldn't be set here. it is set on when a LEFT JOIN condition is not met. imagine the following:
CREATE TABLE a(foo);
CREATE TABLE b(foo);
INSERT INTO a VALUES (1);
INSERT INTO b VALUES (2);
SELECT a.foo, b.foo FROM a LEFT JOIN b on a.foo = b.foo;
1|NULL
the null flag
is set on for the cursor of table b
in the example so that when you call Rowid
or Column
on that cursor, you always get NULL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, I didn't knew what is the purpose of null_flag, so I just copied, this makes much more sense.
unreachable!("index seek key should be a record"); | ||
}; | ||
|
||
let search_key = index_key.clone(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should probably make the key type of EphemeralIndex
to be just Vec<OwnedValue>
so we dont have to do so much cloning here and below, we can just pass a slice to the .range()
calls like rows.range(index_key.values[..last]..)
|
||
let rows = &index.rows; | ||
let mut range = match op { | ||
SeekOp::EQ => rows.range(trimmed_key.clone()..=trimmed_key.clone()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EQ doesnt need to use range, right?
}; | ||
|
||
// this is an obvious makeshift but I didn't find any better way to do it | ||
if matches!(op, SeekOp::GT) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can use rows.range(Bound::Excluded(val), Bound::Unbounded)
(https://doc.rust-lang.org/beta/std/ops/enum.Bound.html) to avoid this
if let Some(record) = range.next() { | ||
let rowid = match record.values.last() { | ||
Some(OwnedValue::Integer(rowid)) => *rowid as u64, | ||
_ => unreachable!("index records should have an integer rowid"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"...as the last element"
) -> Result<CursorResult<()>> { | ||
match &mut self.source { | ||
Ephemeral::Table(table) => { | ||
let rowid = if moved_before { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the significance of moved_before
here? struggling a bit to see how this would be used.
|
||
index | ||
.rows | ||
.retain(|r| !r.values.contains(&OwnedValue::Integer(*rowid))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Several problems with this:
- This scans the entire btree on every insert
- For every element, it scans all the values
- It might even delete false positives, what if there's an unrelated integer element that happens to match the rowid?
What are we trying to do here? Isn't record
already supposed to contain the rowid
as the last element, and insertion should fail if the same full key already exists?
|
||
self.rowid = Some(rowid); | ||
self.current = Some(record.clone()); | ||
self.null_flag = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i already mentioned about null_flag
but just highlighting this as well
} | ||
} else if let Some(current_rowid) = self.rowid { | ||
if let Some((&next_rowid, row_data)) = | ||
table.rows.range((current_rowid + 1)..).next() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem with this is that we're doing an O(log n)
seek on every call to next()
which makes a table scan of the ephemeral table have a complexity of O(n log n)
.
I'm not sure what would be the best way to implement this actually, perhaps we need to store the Range
and do something like this:
struct EphemeralTable<'a> {
set: BTreeSet<usize>,
iterator: RefCell<Option<std::collections::btree_set::Range<'a, usize>>>,
current: RefCell<Option<usize>>,
}
impl<'a> EphemeralTable<'a> {
fn new() -> Self {
Self { set: BTreeSet::new(), iterator: RefCell::new(None), current: RefCell::new(None) }
}
fn next(&'a self) {
if self.iterator.borrow().is_none() {
*self.iterator.borrow_mut() = Some(self.set.range(..));
}
*self.current.borrow_mut() = self.iterator.borrow_mut().as_mut().unwrap().next().cloned();
}
fn prev(&'a self) {
if self.iterator.borrow().is_none() {
*self.iterator.borrow_mut() = Some(self.set.range(..));
}
*self.current.borrow_mut() = self.iterator.borrow_mut().as_mut().unwrap().next_back().cloned();
}
fn seek(&'a self, value: usize) {
let range = self.set.range(value..);
self.iterator.borrow_mut().replace(range);
*self.current.borrow_mut() = self.iterator.borrow_mut().as_mut().unwrap().next().cloned();
}
fn insert(&mut self, value: usize) {
self.set.insert(value);
}
fn current(&self) -> Option<usize> {
self.current.borrow().clone()
}
}
@@ -316,6 +340,11 @@ impl VTabOpaqueCursor { | |||
} | |||
} | |||
|
|||
pub enum GeneralCursor<'a> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tbh both of these are BTree cursors, one of them just uses an in memory BTreeMap/BTreeSet and is not persistent. wonder if we should call this
pub enum GeneralBTreeCursor {
Persistent(&'a mut BTreeCursor),
Ephemeral(&'a mut EphemeralCursor)
}
also, both the persistent and ephemeral cursors share a lot of the same methods and now there's a lot of pattern matching duplication in various opcodes, so wondering if it would make sense to just do it like this:
impl <'a> GeneralBTreeCursor<'a> {
fn rewind(...) {
match self {
Self::Persistent(p) => c.rewind(...)),
Self::Ephemeral(e) => e.rewind(...)),
}
}
}
I'm quite busy at my work this week, so probably I'll address this issue this weekend, but I'll do my best to start sooner. |
No worries! I can always raise another PR to augment |
First step to close #741. It allows creating temporary table operations and improve flexibility in handling in-memory data. Since is my first time working on core code, every feedback is welcome. @jussisaurio @pereman2 @PThorpe92 @penberg